17 Conditional Expectation

By the tower property, $E[Y] = E[E[Y \mid X]] = E\left[\frac{X}{2}\right] = \frac{1}{2}E[X] = \frac{l}{4}$.

1 Conditional Expectation

Assume that the joint density $f_{X,Y}(x,y)$ is well-defined for all $(x,y) \in \mathbb{R}^2$.

Conditional expectation

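If $Y$ is continuous, then we can define the conditional expectation
$$E[Y \mid X=x] = \int_{-\infty}^{+\infty} y \, f_{Y \mid X=x}(y) \, dy.$$
More generally, for a suitable function $g: \mathbb{R} \to \mathbb{R}$,
$$E[g(Y) \mid X=x] = \int_{-\infty}^{+\infty} g(y) \, f_{Y \mid X=x}(y) \, dy.$$
And for a suitable function $h: \mathbb{R}^2 \to \mathbb{R}$,
$$E[h(X,Y) \mid X=x] = E[h(x,Y) \mid X=x].$$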

The conditional density is $f_{Y \mid X=x}(y) = f_{X,Y}(x,y) / f_X(x)$, defined for every $x$ with $f_X(x) > 0$.

If $Y$ is discrete, $$E[Y \mid X=x] = \sum_y y \, P(Y=y \mid X=x).$$
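The discrete formula can be checked by hand on a small table. The sketch below assumes a made-up joint pmf (the values and the names `joint_pmf`, `marginal_x`, `cond_expectation` are illustrative, not from the notes) and uses exact fractions so no rounding is involved.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y); the values are illustrative only.
joint_pmf = {
    (0, 1): Fraction(1, 8),
    (0, 2): Fraction(3, 8),
    (1, 1): Fraction(1, 4),
    (1, 3): Fraction(1, 4),
}

def marginal_x(pmf, x):
    """P(X = x), obtained by summing the joint pmf over y."""
    return sum(p for (xi, _), p in pmf.items() if xi == x)

def cond_expectation(pmf, x):
    """E[Y | X = x] = sum_y y * P(Y = y | X = x)."""
    px = marginal_x(pmf, x)
    if px == 0:
        raise ValueError("conditioning on an event of probability zero")
    return sum(y * p / px for (xi, y), p in pmf.items() if xi == x)

# psi maps each value x to the number E[Y | X = x].
psi = {x: cond_expectation(joint_pmf, x) for x in (0, 1)}
print(psi)
```

Here `psi` plays the role of the function $x \mapsto E[Y \mid X=x]$; evaluating it at a random draw of $X$ gives the random variable $E[Y \mid X]$.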

  1. For a fixed $x$, $E[Y \mid X=x]$ satisfies the usual properties of expectation, e.g., linearity.
  2. $E[Y \mid X=x] = \psi(x)$ is a function of $x$.
  3. $E[Y \mid X] = \psi(X)$ is a function of the random variable $X$, so it is itself a random variable: $\psi(X): \Omega \to \mathbb{R}$, $[\psi(X)](\omega) = \psi(X(\omega))$. It can therefore also be seen as the composition $\psi \circ X$.

Theorem (Law of total expectation/Law of iterated expectation/Tower property)

For any random variable $Y$ s.t. $E[|Y|] < \infty$, $$E[Y] = E[E[Y \mid X]].$$

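By convention, each expectation integrates over whatever randomness remains inside it. The inner expectation $E[Y \mid X]$ averages over $Y$ with $X$ held fixed, and the outer expectation averages over $X$; with explicit subscripts, $E[Y] = E_X[E_{Y \mid X}[Y \mid X]]$.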
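A quick Monte Carlo check makes the two layers concrete. The sketch assumes a hypothetical model, $X \sim \mathrm{Uniform}(0,1)$ and $Y \mid X=x \sim \mathrm{Uniform}(0,x)$, for which $E[Y \mid X] = X/2$; the sample size and seed are arbitrary choices.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 200_000

# Assumed model: X ~ Uniform(0, 1), then Y | X = x ~ Uniform(0, x).
xs = [rng.random() for _ in range(n)]
ys = [rng.uniform(0.0, x) for x in xs]

# Left-hand side: E[Y], estimated directly from the draws of Y.
mean_y = sum(ys) / n

# Right-hand side: E[E[Y | X]] = E[X / 2], estimated from the draws of X only.
mean_psi = sum(x / 2 for x in xs) / n

print(mean_y, mean_psi)
```

Both estimates should land close to $1/4$, the exact value under this assumed model.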

Theorem (Wald's Identity)

Suppose $X_1, X_2, \ldots$ is a sequence of i.i.d. random variables with $E[X_i] = \mu < \infty$, and $N$ is another positive-integer-valued random variable s.t. $N \perp X_1, X_2, \ldots$ and $E[N] < \infty$.
Let $S_N = X_1 + \cdots + X_N$. Then $$E[S_N] = \mu E[N].$$
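Wald's identity is easy to check by simulation. The sketch below assumes a hypothetical setup in which each $X_i$ is a fair die roll ($\mu = 3.5$) and $N$ is uniform on $\{1, \ldots, 10\}$ ($E[N] = 5.5$), drawn independently of the rolls; the helper name `sample_s_n` is illustrative.

```python
import random

rng = random.Random(1)  # fixed seed so the run is reproducible
trials = 100_000

mu = 3.5       # E[X_i] for a fair six-sided die (assumed example)
mean_n = 5.5   # E[N] for N uniform on {1, ..., 10} (assumed example)

def sample_s_n(rng):
    """Draw N independently of the X_i, then return S_N = X_1 + ... + X_N."""
    n = rng.randint(1, 10)
    return sum(rng.randint(1, 6) for _ in range(n))

estimate = sum(sample_s_n(rng) for _ in range(trials)) / trials
wald = mu * mean_n  # the value predicted by Wald's identity

print(estimate, wald)
```

The independence of $N$ from the $X_i$ matters: if $N$ were chosen after looking at the rolls, this version of the identity would not be guaranteed to hold.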

2 Important Applications

2.1 Statistical risk minimization

$Y$ is the random variable of interest (the one we want to predict), and $g(X)$ is a prediction of $Y$. The loss function is $L(Y, g(X))$, and the risk is $R(g) = E[L(Y, g(X))]$, an expectation over both $X$ and $Y$.
The goal is to find $g^* = \arg\min_g R(g)$.

For the mean absolute error (MAE) loss, see below.

Median

For a random variable $X$, a median of the distribution of $X$ is any value $m$ s.t. $$P(X \le m) \ge \tfrac{1}{2}, \qquad P(X \ge m) \ge \tfrac{1}{2}.$$

  1. Every distribution has at least one median.
  2. The median need not be unique.

Theorem (MAE Minimizer)

Let $Z$ be a random variable with a finite median $m$. Then $m$ minimizes $h(c) = E[|Z - c|]$.
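A short argument for the case $c > m$ is sketched below; the case $c < m$ is symmetric and uses $P(Z \ge m) \ge \tfrac{1}{2}$ instead. It assumes in addition that $E[|Z|] < \infty$, so that $h$ is finite everywhere.

```latex
\begin{aligned}
|z - c| - |z - m| &= c - m            && \text{if } z \le m, \\
|z - c| - |z - m| &\ge -(c - m)       && \text{if } z > m, \\
h(c) - h(m) &\ge (c - m)\bigl[P(Z \le m) - P(Z > m)\bigr] \ge 0 .
\end{aligned}
```

The last inequality holds because $P(Z \le m) \ge \tfrac{1}{2}$ forces $P(Z > m) \le \tfrac{1}{2}$.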

Back to risk minimization: with $L(Y, g(X)) = |Y - g(X)|$, the risk is $R(g) = E[|Y - g(X)|]$.
Any function $g$ s.t. $g(x)$ is a conditional median of $Y$ given $X = x$, for each $x$, minimizes $R(g)$.
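The empirical analogue can be checked directly: on a fixed sample, no constant on a grid beats the sample median in mean absolute error. The data and the name `mean_abs_error` are hypothetical; the same argument applies within each slice $X = x$, which is where the conditional median comes from.

```python
import statistics

# Hypothetical sample standing in for draws of Z (skewed on purpose).
z = [1.0, 2.0, 2.5, 3.0, 4.0, 10.0, 50.0]

def mean_abs_error(c, data):
    """Empirical version of h(c) = E[|Z - c|]."""
    return sum(abs(v - c) for v in data) / len(data)

m = statistics.median(z)
mean = statistics.mean(z)

# Scan candidate constants 0.0, 0.1, ..., 60.0 and keep the best one.
grid = [i / 10 for i in range(0, 601)]
best_c = min(grid, key=lambda c: mean_abs_error(c, z))

print(m, best_c, mean_abs_error(m, z), mean_abs_error(mean, z))
```

On skewed data like this, the mean is pulled toward the outlier and does noticeably worse than the median under absolute loss.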

3 Conditional Variance

We know $\mathrm{Var}(Y) = E[Y^2] - (E[Y])^2 = E[(Y - E[Y])^2]$.
The conditional variance is defined as
$$\mathrm{Var}(Y \mid X=x) = E[(Y - E[Y \mid X=x])^2 \mid X=x] = E[Y^2 \mid X=x] - (E[Y \mid X=x])^2.$$

Claim (Law of Total Variance)

$$\mathrm{Var}(Y) = E_X[\mathrm{Var}(Y \mid X)] + \mathrm{Var}(E_{Y \mid X}[Y \mid X]).$$
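The decomposition can be verified exactly on a small discrete example. The sketch assumes a made-up joint pmf (values and helper names are illustrative) and uses exact fractions, so the two sides agree exactly rather than approximately.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y); the values are illustrative only.
joint_pmf = {
    (0, 1): Fraction(1, 8),
    (0, 2): Fraction(3, 8),
    (1, 1): Fraction(1, 4),
    (1, 3): Fraction(1, 4),
}

def expect(f):
    """E[f(X, Y)] under the joint pmf."""
    return sum(f(x, y) * p for (x, y), p in joint_pmf.items())

xs = sorted({x for x, _ in joint_pmf})
p_x = {x: sum(p for (xi, _), p in joint_pmf.items() if xi == x) for x in xs}

def cond_expect(f, x):
    """E[f(Y) | X = x]."""
    return sum(f(y) * p for (xi, y), p in joint_pmf.items() if xi == x) / p_x[x]

cond_mean = {x: cond_expect(lambda y: y, x) for x in xs}
cond_var = {x: cond_expect(lambda y: y * y, x) - cond_mean[x] ** 2 for x in xs}

# Left-hand side: Var(Y) = E[Y^2] - (E[Y])^2.
var_y = expect(lambda x, y: y * y) - expect(lambda x, y: y) ** 2

# Right-hand side: E[Var(Y | X)] + Var(E[Y | X]).
mean_of_cond_var = sum(cond_var[x] * p_x[x] for x in xs)
var_of_cond_mean = (
    sum(cond_mean[x] ** 2 * p_x[x] for x in xs)
    - sum(cond_mean[x] * p_x[x] for x in xs) ** 2
)

print(var_y, mean_of_cond_var, var_of_cond_mean)
```

The first term is the variance left over within each slice $X = x$, and the second is the variance of the slice means themselves.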